-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Serialization Logic #82
Conversation
0b5f901
to
f58127e
Compare
As a part of the python3 conversion, we added logic to try to convert bytestrings to unicode strings. This had a problem because not all bytestrings that are produced as a part of the context passed into and out of the jailed code execution is actually unicode strings. In some cases this was resulting in a unicode decode error. Before this code existed we used to just drop any nonstandard types and also only keep complex types(list,dict,tuples) that were easily jsonable. In python2 this was fine because byte strings would automatically get coerced to unicode strings. However in python3 this will throw errors. So if we don't explicitly try to convert all the byte strings to unicode, we could stop passing back data that we previously used to pass in. The first iteration of this code assumed all byte arrays were unicode strings. From evidence in production this is not always the case. So we fall back to the behaviour we had before this change where if an item can't successfully be converted to something we can serialize to json, we just drop that item from the list of content being passed back through the serialization straw.
f58127e
to
d84f922
Compare
codejail/safe_exec.py
Outdated
@@ -22,7 +22,7 @@ | |||
# Flags to let developers temporarily change some behavior in this file. | |||
|
|||
# Set this to True to log all the code and globals being executed. | |||
LOG_ALL_CODE = False | |||
LOG_ALL_CODE = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume this change should be reverted before merging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
codejail/safe_exec.py
Outdated
try: | ||
new_value = filter_unserializable(v) | ||
if jsonable(new_value): | ||
new_dict[k] = new_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the key comes in as bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add that case.
codejail/safe_exec.py
Outdated
|
||
ok_types = (type(None), int, float, str, six.text_type, list, tuple, dict) | ||
serializable_dict = filter_unserializable(d) | ||
#serializable_dict = d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to keep this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove.
codejail/tests/test_safe_exec.py
Outdated
globs | ||
) | ||
from pprint import pprint | ||
pprint(globs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this just for debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
As a part of the python3 conversion, we added logic to try to convert
bytestrings to unicode strings. This had a problem because not all
bytestrings that are produced as a part of the context passed into and
out of the jailed code execution is actually unicode strings. In some
cases this was resulting in a unicode decode error.
Before this code existed we used to just drop any nonstandard types and
also only keep complex types(list,dict,tuples) that were easily
jsonable. In python2 this was fine because byte strings would
automatically get coerced to unicode strings. However in python3 this
will throw errors. So if we don't explicitly try to convert all the
byte strings to unicode, we could stop passing back data that we
previously used to pass in.
The first iteration of this code assumed all byte arrays were unicode
strings. From evidence in production this is not always the case. So
we fall back to the behaviour we had before this change where if an item
can't successfully be converted to something we can serialize to json,
we just drop that item from the list of content being passed back
through the serialization straw.